Language serves as a cornerstone for human cognition,
yet much about its evolution remains puzzling. Recent
research on this question parallels Darwin's attempt to
explain both the unity of all species and their diversity.
What has emerged from this research is that the unified
nature of human language arises from a shared,
species-specific computational ability. This ability has
identifiable correlates in the brain and has remained fixed
since the origin of language approximately 100 000
years ago. Although songbirds share with humans a
vocal imitation learning ability, with a similar underlying
neural organization, language is uniquely human.

Recent developments in the study of language 

The understanding of language has progressed significantly
in recent years and evidence regarding the neural
correlates of human language has steadily accumulated
[1]. The questions being investigated today could barely
have been formulated half a century ago. A number of
conclusions can be drawn with fair confidence from
research in the past few decades. Human language appears
to be a recent evolutionary development: archaeological
evidence suggests that it arose within the past 100 000
years [2]. So far, no equivalent to human language has
been found in other animal species, including apes and
songbirds [3]. However, some of the systems required for
language, such as the production of ordered sound
sequences, have analogues in other species, such as
vocal-learning songbirds [3] (Box 1). Furthermore, there is
overwhelming evidence that the capacity for language has
not evolved in any significant way since human ancestors
left Africa, approximately 50 000-80 000 years ago [2].
Although there are some individual differences in the
capacity to acquire language, there are as yet no firmly
established group differences (Box 2). If so, then the human
language faculty emerged suddenly in evolutionary time
and has not evolved since.

Languages do change over time, but this describes
change within a single species and is not to be conflated
with the initial emergence of language itself. Famously,
the 19th-century ‘Stammbaum’ (‘family tree’) grammarians
were the first to articulate a view of human language
relationships grounded in the reconstruction of ancestral
language forms by collating sound changes among
semantically similar (‘cognate’) words, for instance, ‘two’,
‘duo’, ‘zwei’, arriving at a phylogeny for all Indo-European
languages [4]. This view inspired Darwin himself to note
parallels between language and species ‘family trees’
([5], p. 422-423). More recently, computational tools drawn
from modern evolutionary biology and phylogenetics have
been applied to language in an attempt to trace the spread
of language diversity and pinpoint the times at which
various languages diverged from one another, with some
success [6-9]. For example, the frequency of word use
seems to follow a clear pattern of ‘descent with
modification’, mirroring Darwinian natural selection [9]. Other
researchers [10], following the seminal work of
Cavalli-Sforza [11], have begun to address the seemingly
microscopically detailed variation that occurs from one language
variant to another, even when in close geographic contact,
aligning this with genetic variation.

However, other researchers have sounded cautionary
notes regarding the validity of biological models of
language variation because it can be difficult to ensure that
biological model assumptions can be carried over intact
into linguistic domains [12]. For example, the shared

Glossary 

Context-free language: a language (set of sentences) generated by a
context-free grammar, namely, a grammar whose rules are all restricted to be in the
form X → w, where X is a single phrase name (such as VP or NP), and w is
some string of phrase names or words. 

Externalization: the mapping from internal linguistic representations to their 
ordered output form, either spoken or manually gestured. 

Internalization: the computations that construct mental syntactic and
conceptual-intentional representations internal to the mind/brain.

Merge: in human language, the computational mechanism that constructs new 
syntactic objects Z (e.g., 'ate the apples') from already-constructed syntactic 
objects X ('ate'), Y ('the apples'). 

Nested dependencies: the particular relationships between elements of a 
sentence; for example, in 'the starling the cats want was tired' - in an abstract 
form: a₁ a₂ b₂ b₁ -, a₁ ('the starling') matches up with b₁ ('was tired'), whereas a₂
('the cats') matches up with b₂ ('want').

Phonology: the study of the abstract sound patterns of a particular language, 
usually according to some system of rules. 

Syntax: the rules for arranging items (sounds, words, word parts, phrases) into 
their possible permissible combinations in a language. 


1364-6613/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tics.2012.12.002 Trends in Cognitive Sciences, February 2013, Vol. 17, No. 2


Box 1. Syntactic song structures? 

Darwin [71] noted the striking parallels between birdsong learning 
and the acquisition of speech in human infants that appear to be 
absent in our closest relatives, the apes [63]. In both juvenile 
songbirds and human infants, individuals imitate the vocalizations 
of adults during a sensitive period early in life and they go through a 
'babbling' stage before they reach the adult form [63,70]. In addition, 
in both cases, the FOXP2 gene is involved in vocalization [63,72,73] 
and songbirds have brain regions that are analogous (and perhaps 
homologous) with human cortical regions involved in speech and 
language [63]. There is a dissociation in the songbird brain between 
regions mainly involved in vocal production and those involved in 
auditory perception and memory, similar to a functional
dissociation between Broca's area and Wernicke's area in the human brain
[63,74]. Recently it was shown that songbirds have human-like left 
hemispheric dominance of these brain regions during birdsong 
learning [75,76]. 

Human language and birdsong both involve complex, patterned 
vocalizations, but does birdsong also have a human-like syntax? In 
human language, hierarchical structure can be assembled by 
combining words into higher-order phrases and entire sentences 
[3]. In birdsong, individual notes can be combined as particular 
sequences into syllables, syllables into 'motifs', and motifs into 
complete song 'bouts'. Variable song element sequences may be 
governed by sequential rules, what Marler [77] has termed 
'phonological syntax'. A recent suggestion that artificial language 
sequences such as AⁿBⁿ can be learned by songbirds [78] has been
demonstrated to rest upon a flawed experimental design [64]. 
Consequently, at present there is no convincing evidence to suggest 
that birdsong patterns can form (strictly) context-free languages or 
exhibit the hierarchical structure that characterizes human language 
[64,69]. 

genetic endowment for language appears to be fixed within
the human species, as discussed in the following section.
Because this underlying ‘language genotype’ is fixed, it
cannot be informative for phylogenetic analysis, which
relies crucially on differences between species (here,
languages) for its basic data (Box 2).

In the remainder of this article, we discuss these novel 
insights into the nature of language. After summarizing 
our views on the nature of language, we discuss the latest 
developments in the study of the neural mechanisms of 
language and evaluate recent evolutionary approaches. 

Human language has a shared computational core 

We turn first to characterizing human language. Perhaps
the core question about language is: what is its basic
‘design’? As with any biological subsystem, the extent to
which this question can be answered is indicative of
whether one can tackle other basic questions, including how
language is acquired and used, how the capacity for
language evolved, how languages vary, and what the neural
correlates of language are.

One way to approach this question is as follows. The
most elementary property of human language is that,
knowing some variety of, say, English, each speaker can
produce and interpret an unbounded number of
expressions, understandable to others sharing similar
knowledge. Furthermore, although there can be four- and
five-word-long sentences, there can be no
four-and-a-half-word sentences. In this sense, language is a
system of discrete infinity [13]. It follows that human
language is grounded in a particular computational mechanism,
realized neurally, that yields an infinite array of
structured expressions.


Box 2. Language variation, language change, and 
evolutionary models 

Contemporary population genetics and computational phylogenetics
provide powerful new tools to model the origin, historical
divergence, and geographic spread of languages, similar to 
biological species [8,79,80]. However, the assumptions behind 
biological phylogenetics do not always hold for language, so such 
methods remain controversial [81,82]. Linguistic variation and 
biological variation may not always be comparable and we lack 
good population-based models for human language change 
coupled with phylogenetic models [81,83]. Human languages share 
a fixed common core and differ only within a small, finite menu of 
structures and sounds that have remained frozen as far back as 
written records exist - unlike the unlimited variation possible for the 
molecular sequences that have revolutionized modern phylogenetics.
Such limits challenge phylogenetic methods because a
language feature might appear many times in a single lineage, but
there is no way to count how many times, so estimating evolutionary
change becomes difficult. There is one exception: the number of
words in a language is effectively unlimited. As a result linguistic 
phylogenetic analysis has generally proved more successful when 
applied to words [9]. Furthermore, the geographic contact of one 
language with another can result in the 'horizontal' transfer of traits 
from one language to another, creating a reticulated network rather 
than conventional branching trees. Here, too, special phylogenetic 
modeling is required, as with bacteria in biology [84]. Given these 
challenges, prominent researchers in the field argue that linguistic
phylogenetic analyses have not yet matured to the point that they
'are capable of accurate estimation of language family trees' ([81], p. 
814) or that one can always disentangle the joint effects of change 
due to shared history from that due to shared geography [84]. 
Consequently, it remains to be seen whether these new tools will 
prove to have as dramatic an impact in linguistic analysis as they 
have in evolutionary biology. 


Each expression is assigned an interpretation at two
‘interfaces’, as depicted in Figure 1, which envisions an abstract
system block diagram for the language faculty. The first
interface appears at the left side of Figure 1, a
sensory-motor interface that connects the mental expressions
formed by syntactic rules at the top of the figure to the
external world, via language production and perception.
The second, a conceptual-intentional interface, depicted on
the right-hand side of Figure 1, connects these same
mental expressions to semantic-pragmatic interpretation,
reasoning, planning, and other activities of the internalized
‘mental world’. In this respect, language satisfies the
traditional Aristotelian conception as a system of sound with
meaning [14].

As with other biological subsystems, such as vision, the
ontogenesis of language (‘language acquisition’) depends
on the interplay of three factors, familiar to biologists [15]:
(i) the shared initial genetic endowment; (ii) external data
(e.g., environmental stimuli, such as the language spoken
to children); and (iii) general principles, such as the
minimization of computational complexity, and external laws of
growth and form. Factor (i) in turn has several components:
(a) language- (and human-)specific components (often
called ‘universal grammar’ [16,17]); (b) conditions imposed
by the structure of the brain; and (c) other cognitive
preconditions (e.g., a statistical analytical capacity). At a
minimum this computational mechanism must be able
to combine one linguistic representation (e.g., ‘ate’) with
others (e.g., ‘the apples’), yielding new, larger linguistic
objects (e.g., ‘ate the apples’). On a general level, therefore,
the computational mechanism for human language
includes some operation that constructs new
representational elements Z from already-constructed elements X, Y.
This operation can be called ‘merge’ [18].

Figure 1. The basic design of language. There are three components: syntactic rules and representations, which, together with lexical items, constitute the basis of the
language system, and two interfaces through which mental expressions are connected to the external world (external sensory-motor interface) and to the internal mental
world (internal conceptual-intentional interface).

Absent contrary evidence, we assume that this
combinatorial operation is as simple as possible, so that ‘merge’
takes just two arguments. The result of merge(X,Y) is
therefore an (unordered) set of two elements {X, Y}, with
X and Y unmodified. In our example, this would be simply
the set {ate, the apples} (where ‘the apples’ must be further
decomposed, a detail that we do not cover here). In turn,
this suggests that wherever linear order appears in
language, it is a reflection of the physical constraints imposed
on the sensory-motor system’s input-output channel -
words must be pronounced sequentially in time. For
example, the plural of ‘apple’, ‘apples’, must be pronounced
with the ‘s’ following ‘apple’, rather than the reverse,
‘sapple’. Similarly, the words in a complete sentence must
necessarily be pronounced one after another rather than
simultaneously, thus giving rise to the various basic word
order patterns in the world’s languages, such as
Subject-Verb-Object order in English. The same holds for language
perception, where listeners analyze sequentially ordered
acoustic sequences. We will call the mapping from the
internal linguistic representations to their ordered output
versions ‘externalization’ (see Glossary). In marked
contrast, linear sequential order does not seem to enter into
the computations that construct mental
conceptual-intentional representations, what we call ‘internalization’ [12].
If correct, this calls for a revision of the traditional
Aristotelian notion: language is meaning with sound, not sound
with meaning. One key implication is that communication,
an element of externalization, is an ancillary aspect of
language, not its key function, as maintained by what is
perhaps a majority of scholars (cf. [19,20], among many
others). Rather, language serves primarily as an internal
‘instrument of thought’ [18].
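
As a rough illustration of the set-based definition of ‘merge’ given
above, the following Python sketch models lexical items as strings and
merged objects as frozensets. The names SyntacticObject and merge, and
the choice of frozenset, are assumptions made for this example only;
they are not part of any formalization cited in the text.

```python
from typing import FrozenSet, Union

# A syntactic object is either a lexical item (a string) or a set built by merge.
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(x: SyntacticObject, y: SyntacticObject) -> SyntacticObject:
    """Combine two syntactic objects into the unordered set {x, y}.

    Neither argument is modified, and no linear order is recorded.
    """
    return frozenset({x, y})


# 'the apples' is itself built by merge, then combined with 'ate'.
the_apples = merge("the", "apples")
ate_the_apples = merge("ate", the_apples)

# The order of the arguments does not matter: the result is a set.
assert merge("ate", the_apples) == merge(the_apples, "ate")
# Both inputs are still present, unmodified, inside the result.
assert "ate" in ate_the_apples and the_apples in ate_the_apples
```

Because the result carries no left-to-right order, any ordering of the
words has to be imposed by a separate step, which is the role the text
assigns to externalization.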


Further, it should be evident that, although any two 
arbitrary syntactic objects, including words, may be 
merged, the result is not always meaningful at one or 
the other of the interfaces. For example, while the merge 
of ‘ate’ and ‘the apples’ results in a new, interpretable 
structured object, ‘ate the apples’, this is not always the 
case; combining ‘sleep’ and ‘the apple’, ‘sleep the apple’, 
results in a structured object that the conceptual interface 
rejects as malformed. 

What licenses some combinations but not others? Valid
combinations work somewhat like the notion of electron
donors and acceptors that form chemical bonds and so
chemical compounds - for instance, an oxygen atom needs
to accept two electrons, which are provided by two
hydrogen atom donors, to complete its orbital shell, forming the
chemical compound H₂O. Analogous to this, merged
structures act like chemical compounds: one property (or
feature) of a word such as ‘ate’ is that it requires something
that is eaten, if only implicitly, here ‘the apples’ (the Object
of the sentence). Additionally, considered as a predicate,
‘ate’ can specify who is doing the eating (the Subject). Here,
‘ate’ plays a role analogous to that of oxygen, requiring two
‘electron donors’ (the Object and the Subject), whereas ‘the
apples’ and, for example, ‘Charlie’ (the Subject) act like the
hydrogen atom ‘donors’. In linguistic parlance, ‘ate’ is the
kind of word that ‘probes’ (or seeks) a ‘goal’ with certain
features - namely, the goal must be the kind of syntactic
object that can be, for instance, an Object, such as ‘the
apples’.

But what should be the name of the newly created
‘chemical compound’ formed by probe-goal assemblies such
as ‘ate the apples’? In human language syntax, one can
posit a labeling algorithm as part of the linguistic system
itself: in a combination such as {ate, the apples} one element
(‘ate’) is taken to label the newly-created compound. This
representation distils much of what human language
syntax requires for further syntactic computation: that ‘ate the
apples’ forms a new syntactic object, a phrase (known in
conventional grammar as a Predicate Phrase or a Verb
Phrase), and that this structure is labeled with the
verb-like features of ‘ate’, therefore having verbal properties, at
least as far as linguistic syntax is concerned, as well as for
any sound and meaning properties. This is so because the
conceptual interface must know, for example, whether a
syntactic object is a predicate or not, whereas the
sensory-motor interface must know whether a word such as
‘produce’ is a noun or a verb in order to assign to it proper
pronunciation with the correct stress (if the word ‘produce’
is a verb, then its stress falls on the second syllable,
proDUCE, whereas as a noun the stress falls on the first
syllable, PROduce). Crucially, in the case of one syntactic
object that is a lexical item, such as ‘ate’, along with another
that is a more complex syntactic object, such as ‘the apples’,
then the labeling algorithm selects the lexical item (in our
example, the verb ‘ate’) as the label for the newly composed
syntactic object, rather than, say, both elements.
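
The one labeling case spelled out above (a lexical item combined with
a more complex object) can be sketched in a few lines of Python. This
is only an illustration under simplifying assumptions: lexical items
are plain strings, complex objects are frozensets, and the function
name label is hypothetical. Configurations the passage does not
discuss simply return None here.

```python
def label(x, y):
    """Return the label of the set {x, y} under the simple rule in the text.

    If exactly one member is a lexical item (a plain string) and the other
    is a more complex object (a frozenset), the lexical item is the label.
    All other configurations are left undecided (None) in this sketch.
    """
    x_is_lexical = isinstance(x, str)
    y_is_lexical = isinstance(y, str)
    if x_is_lexical and not y_is_lexical:
        return x
    if y_is_lexical and not x_is_lexical:
        return y
    return None


the_apples = frozenset({"the", "apples"})

# The verb labels the phrase regardless of argument order.
assert label("ate", the_apples) == "ate"
assert label(the_apples, "ate") == "ate"
# Two lexical items: not covered by the passage, so no label is chosen.
assert label("the", "apples") is None
```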

In this sense, natural language phrases labeled with a
lexical head (such as a verb, preposition, or adjective) plus
some already-built phrase will exhibit the same
characteristic structural pattern. Importantly, neural correlates
and particular brain regions for this kind of
structure-building have recently been discovered (see the following
section).

Operating freely, ‘merge’ results in a ubiquitous human
language phenomenon, the apparent ‘displacement’ of
phrases from their normal positions of semantic
interpretation. Consider a sentence such as ‘Guess what he saw’.
Oversimplifying, this sentence is produced by successive
merge operations (forming ‘he saw what’, then ‘what he
saw what’, and finally, ‘guess what he saw what’). What is
actually spoken arises by deleting the embedded
occurrence of ‘what’, a simplification following the principles of
factor (iii) above, reduction of computational complexity,
yielding a sentence that is easier to pronounce because it
contains only one copy of ‘what’.
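
The copy-deletion step in the ‘Guess what he saw’ example can be
mimicked with a deliberately oversimplified Python sketch. It assumes
a flat, already-ordered list of words (the internal representation
described in the text is hierarchical and unordered), and the function
name externalize and its parameters are hypothetical.

```python
def externalize(words, displaced):
    """Spell out a word sequence, pronouncing only the first copy of `displaced`.

    Later (embedded) copies of the displaced item are left unpronounced.
    """
    spoken = []
    seen = False
    for word in words:
        if word == displaced:
            if seen:
                continue  # skip the embedded copy
            seen = True
        spoken.append(word)
    return " ".join(spoken)


# Result of the successive merge steps, with both copies of 'what' in place.
internal = ["guess", "what", "he", "saw", "what"]
print(externalize(internal, "what"))  # guess what he saw
```

The listener receives only the shortened string and has to work out
where the unpronounced copy belongs.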

Unfortunately, the deletion of copies to make sentence 
production easier renders sentence perception harder, a 
fact familiar from the large literature on parsing human 
language [21]. For instance, in ‘Who is too stubborn to talk 
to Morris?’, ‘who’ must be interpreted as the Subject of 
‘talk’, the person who is too stubborn to talk. However, if we 
encounter ‘Who is too stubborn to talk to?’, then ‘who’ must 
instead be interpreted as the Object of ‘talk to’. Here, ‘who’
does not appear where expected, that is, after ‘talk to’, but 
rather at the beginning of the sentence. 

Consequently, displacement results in a direct conflict 
between two competing functional demands: one that
follows the computational dictates of factor (iii) above and a
second that follows a principle of communicative efficiency. 
The former prevails, apparently in all languages and all 
relevant structures [12], again supporting the conclusion 
that externalization (a fortiori communication) is ancillary 
to language design. 

The simplest version of ‘merge’ has many complex
interpretive consequences, supporting the reality of the
representations proposed above. Consider the examples 
in (i)-(iii), where we have left in place the copies that have 
been displaced in (iii): 


(i) they expect to see each other; 

(ii) guess which boys they expect to see each other; 

(iii) guess which boys they expect which boys to see each 
other. 

Typically, a word such as ‘each other’ seeks the closest 
possible word(s) it refers to, where ‘closest’ is determined 
by sentence structure, not the number of intervening 
words. That holds true in (i), where ‘they’ is closest to ‘each 
other’. However, in (ii) the word closest to ‘each other’, 
again ‘they’, is not selected as the antecedent of ‘each 
other’. Rather, the antecedent of ‘each other’ is ‘which boys’.
Evidently, what reaches the mind for interpretation is not 
the form (ii), but rather the expression (iii), where ‘which 
boys’ is indeed closest to ‘each other’, as predicted by merge 
in conjunction with the computational principle that seeks 
the closest possible antecedent. Numerous and far more 
intricate examples similar to these, ranging across many 
different languages, illustrate that ‘merge’ operates in the 
way suggested earlier [12,13]. For an explicit formalization
of ‘merge’ and this model of syntax, see [22].

In this way, much of the apparent complexity of
language flows from externalization, with variation from one
language to the next corresponding to different solutions to 
the way that internal syntactic representations ‘surface’ as 
sentences. These are precisely the aspects of language 
readily susceptible to variation and historical change, 
where models drawn from evolutionary biology have a role 
to play in accounting for language variation (Box 2). 
Whereas learning English requires acquiring from
external experience the particular details for English sounds,
word formation, word order, and the like, no individual
needs to learn constraints such as those exhibited by
examples (i)-(iii), which apply in all languages, apparently
without exception. These come to us ‘from the original
hand of nature,’ in David Hume’s phrase [23] - derived
from the human genetic endowment and its
language-specific components, as well as from general computational
principles.

Language, words, and evolution 

The computational procedure sketched above must include
a set of atomic elements that are unanalyzable for the
purposes of the computation - though, like atoms, they
may be analyzable in different terms. For the core
computations of language, this collection is called the ‘lexicon’, a
set of roughly word-like elements. Although essential for
language, these elements raise serious and rarely discussed
challenges for evolutionary analysis, in part because
they appear to be radically different from anything found
in animal communication.

As an example of this gap, Laura-Ann Petitto, one of the
leading researchers of primate communication and early
language acquisition, observes that a chimpanzee uses the
label for ‘apple’ to refer to ‘the action of eating apples, the
location where apples are kept, events and locations of
objects other than apples that happened to be stored with
an apple (the knife used to cut it), and so on and so forth -
all simultaneously, and without apparent recognition of
the relevant differences or the advantages of being able to
distinguish among them’ ([24], p. 86).

By sharp contrast, she continues, for human infants
even the first words ‘are used in a kind-concept constrained
way (a way that indicates that the child’s usage adheres to
“natural kind” boundaries)’. Even after years of training, a
chimpanzee’s usage ‘never displays this sensitivity to
differences among natural kinds. Surprisingly, then, chimps
do not really have “names for things” at all. They have only
a hodge-podge of loose associations’ ([24], p. 86). This is
radically different from humans.

A closer look shows that humans also do not have 
‘names for things’ in any simple sense. Even the simplest 
elements of the lexicon - ‘water’, ‘tree’, ‘river’, ‘cow’,
‘person’, ‘house’, ‘home’, etc. - do not pick out (‘denote’)
mind-independent entities. Rather, their regular use relies
crucially on the complex ways in which humans interpret the
world: in terms of such properties as psychic continuity, 
intention and goal, design and function, presumed cause 
and effect, Gestalt properties, and so on. It follows that the 
meanings of even the simplest words depend crucially on 
internal cognitive processes and cannot be spelled out in 
strictly physical terms. Human words and concepts differ 
sharply from those in the rest of the animal world in just 
about every relevant respect: their nature, the manner of 
their acquisition, and their characteristic use. 

What is true of simple words becomes far more
mysterious when we move to more complex concepts or to
acquisition of language under conditions of sensory limitation,
for example, acquisition of language by the blind, who
readily achieve exquisite understanding of words for what
seeing individuals perceive, as Landau and Gleitman have
shown [25]. Or, to take another example of Gleitman’s to
illustrate the remarkable feats of language acquisition,
consider ‘such words as fair (as in “That’s not fair!”), a
notion and vocabulary item that every child with a sibling
learns quickly, and in self-defense’ ([26], p. 25) - and a
concept of considerable subtlety, a centerpiece of
contemporary moral philosophy. As she and others have shown,
that barely scratches the surface. Not only are the
meanings of words intricate, far beyond any evidence available
to the child, but they are also learned with amazing
rapidity, approximately one per waking hour at the peak
period of language acquisition.

Such facts pose extremely hard and crucial questions
both for the study of acquisition of language and evolution
of the human language capacity. Note that, as in the case of
human language syntax, the usual tool for evolutionary
analysis, the comparative method, cannot be applied, in an
even more radical sense. Whereas analogies between
human words and primate vocal calls have sometimes been
drawn (see, e.g., [27] on vervet monkeys), it has become
more apparent over time that if the minds of these
creatures really had a human-like capacity for expression, then
there should be no acoustic barrier to stop at just a handful
of calls, yet that is what Seyfarth and Cheney [27]
observed. Furthermore, there seems to be no vocal learning,
so even if a new call was introduced in a group, accurate
reproduction seems impossible. Moreover, such calls lack
key properties of human words: no abstractions and no
‘displacement’ - calls remain linked to what monkeys are
presently experiencing (exactly as with the chimpanzee
use of the item ‘apple’ cited by Petitto earlier). Taken
together with the apparent absence of ‘symbolic behavior’
in the most closely related extinct species of Homo [2], there is
scant evidence on which to ground an evolutionary account
for words.

Human language has a fixed neural architecture 

Recent technical advances in neuroimaging have greatly
increased our understanding of these language-related
processes in the human brain. Natural language and
artificial grammar studies have made it possible to determine
the neural bases of processing hierarchically structured
sequences. Results from studies of artificial grammar
learning across species strikingly parallel the distinctions
in linguistics between the structures that are
characteristic of natural language and those structures involved in
other kinds of cognitive processes.

The study of the neural basis of language must consider
those parts of the brain that represent the core
computations which are thought to be universal, as well as those
which constitute the interface systems that may vary
across individuals, as these interfaces rely on individual
working memory, reasoning, and conceptualization
abilities (Figure 1). At the neural level, core computations may
be differentiable from a sensory-motor interface and a
conceptual system. Each of these systems consists of
particular brain regions connected via specific fiber tracts
forming a neural network. In this context, two different
dorsally located pathways have been identified, one
involving Brodmann area (BA) 44 and the posterior superior
temporal cortex (pSTC) that supports core syntactic
computations [28] and one involving the premotor cortex
(PMC) and the STC that subserves the sensory-motor
interface [29]. There are also ventrally located pathways
which involve brain regions that support semantic
processes. These are BA 45 in the inferior frontal cortex and
portions of the temporal cortex (for discussion, see [30];
Figure 2). These networks will be specified below.

Neural mechanisms for syntax and hierarchical 
structures 

Human language contains hierarchical structures that are
a product of multiple ‘merge’ operations. It has long been
shown that the processing of hierarchically complex
sentences involves Broca’s area, in particular the pars
opercularis (BA 44) in the inferior frontal gyrus (IFG;
Figure 2) (for a review, see [1]). Recent artificial grammar
studies investigating key differences between animals and
humans [28,31,32] have often used two types of strings:
one of the format (AB)^n (Figure 3a) and one of the format
A^nB^n (Figure 3b). The processing of A^nB^n sequences
activates Broca’s area (BA 44), whereas (AB)^n sequences
activate the frontal operculum [28], a phylogenetically
older cortical area than Broca’s area [33,34]. Note that
A^nB^n sequences could, in principle, be processed without
necessarily building hierarchically structured
representations at all, by using a counting mechanism along
with working memory that checks whether the same number of
Bs follows the As [35]. Such a process could in principle be
at work in animals and humans. Interestingly, in humans
Broca’s area (BA 44) has been found to be activated for the
processing of A^nB^n sequences [28] and for the processing
of complex hierarchical structures in natural languages
(Figure 3c) [36-38]. In an elegant study by Moro and
colleagues [39], German native speakers successfully learned
either ‘real’ or ‘unreal’ grammatical rules of different
languages (Italian or Japanese). In the ‘unreal’ versions of
the unfamiliar language, the same lexicon was used as in the
‘real’ versions, but the sentences violated the rules of
universal grammar. For instance, in a ‘real’ sentence, a
literal translation of ‘I eat the pear’ from Italian is ‘Eat
the pear’. An example of an ‘unreal’ negating sentence is one
where the negative particle is placed after the third word,
which does not happen in any natural language. Such an
Italian negating sentence in English is ‘Paolo eats the no
pear’. Using fMRI, the authors found that increased
activation over time in Broca’s area during the learning
task was specific for ‘real’ language that observed the
principles of universal grammar, independent of the language
used. These findings again suggest a role for Broca’s area
in the processing of syntax. Importantly, the participants
were able to learn the ‘unreal’ grammatical rules, as well
as the ‘real’ ones, but, apparently, other brain regions
were activated in the process, apart from Broca’s area,
which suggested that language can be neurally dissociated
from other cognitive capacities.

Figure 2. Language-related regions and fiber connections in the human brain. Displayed is the left hemisphere. Abbreviations: PMC, premotor cortex; STC, superior temporal cortex; p, posterior. Numbers indicate cytoarchitectonically defined Brodmann areas (BA). There are two dorsal pathways: one connecting pSTC to PMC (dark red) and one connecting pSTC to BA 44 (blue). Moreover, ventral pathways connecting BA 45 and the ventral inferior frontal cortex (vIFC) to the temporal cortex (TC) have also been discussed as language-relevant.

Figure 3. Artificial strings and natural grammars. (a) Strings of the format (AB)^n, in which each A-category item is followed by a B-category item. (b) Consecutive sequences of equal numbers of A-category items followed by B-category items can be recognized without necessarily building hierarchical structure, by simply verifying that the number of A-category members to the left matches the number of B-category members to the right. Such sequences can also be learnt by songbirds (Box 1). (c) By contrast, natural language structures are always hierarchical and must be processed as such.
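The counting strategy noted above for A^nB^n sequences can be made concrete with a small sketch. The Python function below is illustrative only and is not taken from any of the cited studies: it assumes that each item has already been assigned to its category label ("A" or "B"), and the function name is hypothetical. It accepts A^nB^n strings using a single counter held in memory, without building any hierarchical representation.

```python
def is_anbn_by_counting(tokens):
    """Accept A^n B^n (n >= 1) with one counter and no tree structure.

    `tokens` is a sequence of category labels, "A" or "B" (an assumption:
    items are taken to be already classified into their categories).
    """
    count = 0        # number of As still waiting for a B
    seen_b = False   # becomes True once the B block has started
    for tok in tokens:
        if tok == "A":
            if seen_b:
                return False   # an A after a B breaks the A...B order
            count += 1
        elif tok == "B":
            seen_b = True
            count -= 1
            if count < 0:
                return False   # more Bs than As
        else:
            return False       # unknown category label
    return seen_b and count == 0


if __name__ == "__main__":
    for s in ["AB", "AABB", "AAABBB", "AAB", "ABAB"]:
        print(s, is_anbn_by_counting(s))
```

Because the check succeeds by counting alone, correct performance on such strings does not by itself show that a learner has built hierarchical structure.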

Natural sentence processing, in contrast to artificial
grammar processing, involves the posterior superior
temporal cortex (STC) in addition to BA 44 as part of
Broca’s area, to which it is connected via the arcuate
fascicle (AF) and parts of the superior longitudinal fascicle
(SLF) (Figure 2).

The finding that the processing of natural syntactically
complex sentences involves the posterior STC in addition
to Broca’s area, in particular BA 44 [40,41], whereas the
processing of artificial grammar sequences only involves
Broca’s area [28], suggests that within this network BA 44
supports complex structure-building, whereas the
integration of syntactic information and semantic
information to achieve sentence interpretation is subserved
by the posterior STC. This dorsal connection between BA 44
and the STC supports the processing of syntactically complex
sentences [42,43]. Evidence for the relevance of the dorsal
connection between BA 44 and the posterior STC for the
interpretation of syntactically complex sentences comes
from studies showing that, if this fiber tract is not fully
matured [42] or not intact [43], processing such sentences
is deficient.

In humans, there is an additional dorsal pathway that
connects the auditory sensory regions in the STC with the
premotor cortex (PMC) in the precentral gyrus [44-46]. In
contrast to the other dorsal pathway, this second neural
circuit is present in the infant brain at birth and remains
unchanged throughout life [47] (Figure 3). In adults this
pathway is involved in oral repetition of speech [29], and in
infants this sensory-to-motor mapping circuit appears to
support the phonology-based language learning demonstrated
during the first months of life [48,49]. Thus, although this
pathway in itself allows the detection of phonologically
coded rules, it is not sufficient to process the structure
built by human grammars.

Thus, during ontogeny the dorsal connection between
STC and the PMC is present at birth and probably
supports auditory-based phonological learning during early
infancy [48,50], one component of the process of
externalization. The full maturation of the dorsal connection
between BA 44 and the STC, which only seems to happen
around the age of 7 years [40], appears to be necessary to
process syntactically complex sentences [51].

Neural mechanisms for processing meaning 
The question of how the human brain achieves meaning 
assignment has been investigated at two levels: the
single-word level and the sentence level. Many studies have
investigated meaning at the word level (for a review, see
[52]), but only a few of these studies considered the fact that
lexical-semantic and conceptual-semantic aspects during 
word processing are not easily distinguishable. Within this 
context, the anterior temporal cortex has been discussed as 
a region that represents semantic-conceptual knowledge 
independent of sensory, motor, and language aspects, 
which in turn are represented in other parts of the cortex, 
with words recruiting the inferior frontal and superior 
temporal cortex in particular [53].

Beyond the level of single words, a significant number of 
neuroimaging studies have focused on meaning assignment 
during sentence processing, but because this process 
involves inference, semantic-conceptual knowledge, and 
reasoning, the localization of its neural substrates is more 
variable across individuals and therefore more difficult to 
assess. Many researchers have approached the processing of
meaning empirically by comparing normal sentences to
so-called scrambled sentences or word lists containing
pseudowords. These studies mainly found activation in the pars
orbitalis (BA 47) and the pars triangularis (BA 45) in the
inferior frontal gyrus (IFG) and the anterior temporal cortex
(for a review of these studies, see [1]). Recently, BA 45/47 has
been described as being domain-specific for language [54] or 
as correlating with the size of linguistic constituents in 
particular [55]. In the latter study, regions in the temporal
pole and anterior STC were activated in proportion to the
size of the constituents only when they contained
lexico-semantic information, which suggests that these regions are
involved in semantic encoding [55]. Others have compared 
the processing of sentences with implausible and plausible 
meanings and found that BA 45 and BA 47 were activated as 
a function of implausibility and the anterior and posterior 
superior temporal cortex were activated as a function of 
plausibility (for a review, see [55]). 

These inferior frontal and temporal regions are
connected via ventral pathways which, however, are hard to
differentiate neuroanatomically because they run in close
vicinity when passing the insular cortex [56,57] (Figure 2).
Within this ventral network, IFG activation is argued to
reflect semantic competition and controlled semantic
processes, such as judgment and categorization, both at the
word-level [58,59] and sentence-level [60]. Activations in
the temporal cortex are reported for the anterior, as well as
the posterior portion. The anterior temporal cortex has
been associated with semantic combinatorial processes
[61], whereas the posterior STC has been argued to support
the integration of semantic information provided by more
anterior temporal regions and syntactic information
provided by Broca’s area via the dorsal pathway [30]. Patient
studies indicate that the ventrally located system is crucial
for language comprehension [62]. It may reflect aspects of
the internal interface, such as the retrieval and
manipulation of semantic information.

In sum, neuroimaging studies suggest that, in addition 
to a sensory-to-motor mapping system, there are at least 
two other language-relevant systems at work in the adult 
human brain: a dorsal and a ventral language system. 
First, the dorsal system involves Broca’s area (in particular
BA 44), which supports core syntactic rule-based
computation of hierarchical structure building and which,
together with the posterior temporal cortex, subserves the
comprehension of complex sentences. Second, the ventral
system, which involves BA 45/47 and the temporal cortex, 
supports the processing of lexical-semantic and conceptual 
information. To what extent these two systems represent 
the assumed external and internal interfaces must be 
evaluated in future studies. 

Language evolution 

There is no equivalent to human language in other animal
species [3], which poses a challenge for the mainstay of
evolutionary explanation, the comparative method.
Typically, evolutionary biologists examine species whose last
common ancestor with humans is ancient, in order to
search for evidence of convergent evolution, or conversely,
species whose last common ancestor with humans is
relatively recent, in order to search for features of shared,
common descent with modification [63].

Evidence of convergent evolution 

Songbirds provide an illustrative example of the former
case. Songbirds are capable of sophisticated auditory
learning and perception and of vocal production, in certain
critical ways mirroring the developmental acquisition and
production of human speech, even with analogous brain
circuitry [63] (Box 1). However, speech is only an
externalization of the internal representations of language as
depicted in Figure 1, which limits the comparative power
of the songbird model. Furthermore, songbirds lack two
essential ingredients of human language: first, the link
between word-structured sentences and distinct meanings;
and second, the ability to process the hierarchical
structures typical of natural language [3,35,64], as described in
the previous section (Figure 3).

Shared, common descent with modification
Turning to the case of common descent and more closely
related species, in primates, comparative phylogenetic
studies of macaque, chimpanzee and human brains reveal
fiber tract differences, in particular with respect to the
dorsal pathway that connects language-relevant areas in
humans noted in the previous section. The dorsal pathway
that connects Broca’s area (BA 44) and Wernicke’s area in
STC undergoes considerable phylogenetic change: it is
weak in non-human primates, but strong in humans
[65]. Moreover, cross-species comparative studies on
language learning reveal important differences in grammar
processing, in particular for hierarchical structures.
Comparisons between monkeys and humans indicate that
monkeys can learn adjacent dependencies in (AB)^n strings
but not non-adjacent dependencies in A^nB^n strings,
whereas humans easily learn both [31]. Here, in non-human
primates, the evidence is equivocal, since for small n the
(AB)^n and A^nB^n patterns can both be learned simply by
counting matching A’s and B’s. Whereas the processing of
A^nB^n strings recruits Broca’s area (BA 44), the processing of
(AB)^n strings relies on a phylogenetically older cortex, the
frontal operculum [28,33].
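The contrast between adjacent and non-adjacent dependencies can be illustrated with a toy learner that tracks only which items may directly follow one another. The sketch below is an assumption-laden illustration and not a model used in the cited experiments: the function names, the boundary markers "<s>" and "</s>", and the training strings are all hypothetical. It suggests why adjacency information is enough to capture (AB)^n, whereas for A^nB^n it cannot rule out strings with unequal numbers of As and Bs.

```python
def bigrams(tokens):
    """Return the set of adjacent pairs, including start/end boundary markers."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return set(zip(padded, padded[1:]))


def learn_adjacent(training_strings):
    """'Learn' a pattern by storing every adjacent pair seen in training."""
    allowed = set()
    for s in training_strings:
        allowed |= bigrams(s)
    return allowed


def accepts(allowed, tokens):
    """Accept a string if all of its adjacent pairs were seen in training."""
    return bigrams(tokens) <= allowed


# Hypothetical training sets of category-label strings.
ab_n = learn_adjacent(["AB", "ABAB", "ABABAB"])
anbn = learn_adjacent(["AB", "AABB", "AAABBB"])

if __name__ == "__main__":
    print(accepts(ab_n, "ABABABAB"))  # longer (AB)^n string
    print(accepts(ab_n, "AABB"))      # not of the form (AB)^n
    print(accepts(anbn, "AAAABBBB"))  # longer A^n B^n string
    print(accepts(anbn, "AAAB"))      # unequal counts, yet every pair is familiar
```

In this sketch the adjacency learner generalizes correctly to new (AB)^n strings, but it also accepts A-then-B strings with mismatched counts, so something beyond adjacent transitions (for example counting) is needed for A^nB^n.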

Taken together, the evidence on birds and primates 
suggests that three factors are important in the evolution 
of speech and language. First, there is neural and genetic
homology: similar genes and brain regions are involved in
auditory learning and vocal production, not only in
songbirds and humans, but also in apes and monkeys. Second,
there is evolutionary convergence with regard to the
mechanisms of auditory-vocal learning, which proceeds in
essentially the same way in songbirds and human infants,
but not in apes or monkeys. Third, the combinatorial
complexity of human language is unique in the animal 
kingdom [3,35,64]. It may be that the neural mechanisms 
that evolved from a common ancestor, combined with the 
auditory-vocal learning ability that evolved in both 
humans and songbirds, contributed to the emergence of 
language uniquely in the human lineage. 

Concluding remarks and future directions 

The discussion regarding the cognitive capacities
particular to human language as opposed to those found across
many other animal species has shifted radically in recent
years, not only in the domain of cognitive neuroscience, but
also in linguistic theory. Over the past 60 years, linguistic
theory has consistently sought to reduce the set of cognitive
properties that are specific to human language, moving more
of them instead into the realms of general animal cognition
or biophysical constraints. Perhaps the most dramatic
reduction has been in the intricacy of the assumptions and
stipulations required to formulate the linguistic grammars of
the early 1950s [66], which drew on complex Boolean rule
conditions, rules, specific rule orderings, language-particular
features, and similar devices. This has given way to a far
simpler set of basic principles, in much the same way that
the descriptively adequate, but overly complex epicycle
account of planetary motion was subsumed under Kepler’s
and Newton’s handful of laws. If this work is on the right
track, in effect only the simple ‘merge’ system plus words
remain uniquely human, although too much at present is
not understood to be confident about this bold conclusion.

From this standpoint, it is no surprise that researchers
demonstrate with some regularity that, in the domain of what
we have called ‘input-output’ systems (externalization),
non-human animals can engage in such tasks as musical
rhythmic entrainment [67] or perception of degraded
speech [68], formerly thought to be the sole province of
humans. The realization that fewer aspects of language
externalization are human-specific than previously
thought has greatly improved the prospects for using
animal models to understand this particular interface and has
sharpened our ability to pinpoint neural mechanisms that
in fact are human language specific. To be sure, however,
striking differences highlighted nearly sixty years ago
remain: human reliance on sophisticated structure-building
to assemble an unbounded array of interpreted
expressions, unlike the bounded call systems of any non-human
animal.

To the extent that modern linguistic theory has revealed
the underlying properties of language, it would seem
appropriate to use these properties in future experimental
probes of both non-human and human competences related
to language, as well as to more nuanced accounts of
language use and change. Similarly, the study of language
historical change and phylogenetics must carefully
distinguish between the fixed properties of human language and
those that vary from language to language, perhaps
culturally. Formulating accurate evolutionary analogues for
language change seems key; here, unifying single-language
population models with the cross-linguistic
phylogenetics used so far would seem to be a crucial step.

Animal models for human language should move away
from tests associated with the more superficial, external
aspects of human language, such as simple A^nB^n strings,
and instead probe for the hierarchical sequential
structures that are described by linguistics, have known neural
correlates, and are essential to language. Rather than
non-human primates, songbirds and parrots are the most relevant
animal models to study the neural mechanism of
auditory-vocal learning and the production of structured
vocalizations [63,64,69]. Convergent evolution of neural
mechanisms underlying speech and birdsong suggests that
there are optimal neural solutions to problems of
auditory-vocal learning. Animal research thus has important
heuristic value for the study of human speech and language
and its disorders.

Regarding the neural mechanisms of human language, 
research should focus on distinguishing neural networks 
supporting the externalization of language from those 
engaged in core syntactic computations, such as ‘merge’. 
Moreover, direct comparisons of language processing, as 
well as language learning, in the developing brain and in 
the mature brain should be more systematically considered
as a window to the neurobiological basis of human
language.

Recent developments in both animal and human
research and comparisons between these suggest a novel
approach to the study of language evolution. Of course,
evolution in and of itself cannot explain the complete
nature of language [70], but contemporary analyses
suggest that we need to rethink language evolution to begin
with. First, regarding human-animal similarities in the
domain of auditory-vocal learning, the fact that
evolutionary convergence has been found to be more important than
common descent has important consequences for the
evolution of these capabilities [63]. Second, as we have
discussed, there are crucial differences between humans
and any non-human species in terms of syntactic
capabilities [3,64] that constrain evolutionary analyses. Only
with these points in mind can we begin to understand the
nature of language and its underlying neural mechanisms.